Abstract

Background: The PCR technique and its variations have been increasingly used in the clinical laboratory, and recent advances in this field have generated new higher-resolution techniques based on nucleic acid denaturation dynamics. The principle of these new molecular tools is the comparison of melting profiles after denaturation of a DNA double strand. Until now, the secondary structure of single-stranded nucleic acids has not been exploited to develop PCR-based identification systems. To test the potential of single-stranded RNA denaturation as a new alternative for detecting specific nucleic acid variations, sequences from viruses of the Totiviridae family were compared using a new in silico melting curve approach. This family comprises double-stranded RNA viruses whose genome is constituted by two ORFs, ORF1 and ORF2, which encode the capsid/RNA-binding proteins and an RNA-dependent RNA polymerase (RdRp), respectively.

Results: A phylogenetic tree based on RdRp amino acid sequences was constructed, and eight monophyletic groups were defined. Alignments of RdRp RNA sequences from each group were screened to identify RNA regions with conserved secondary structure. One region in the second half of ORF2 was identified and individually modeled using the RNAfold tool. Afterwards, each DNA or RNA sequence was denatured in silico using the programs MELTSIM and RNAheat, which generate melting curves for the denaturation of double-stranded DNA and single-stranded RNA, respectively. The same groups identified in the RdRp phylogenetic tree were retrieved by a clustering analysis of the melting curve data obtained from RNAheat. Moreover, the same approach was used to successfully discriminate different variants of Trichomonas vaginalis virus, which was not possible by visual comparison of the double-stranded melting curves generated by MELTSIM.

Conclusion: In silico analysis indicates that ssRNA melting curves are more informative than dsDNA melting curves. Furthermore, conserved RNA structures may be determined from the analysis of phylogenetically related individuals, and these regions may be used to support the reconstruction of their phylogenetic groups. These findings are a robust basis for the development of in vitro systems for detecting ssRNA melting curves.

Keywords: RNA secondary structure, Infectious Myonecrosis Virus, high resolution melting curve, Virus detection, 
Background

Despite the emergence of new techniques for nucleic acid investigation, such as next-generation sequencing and microarrays, the Polymerase Chain Reaction (PCR) and its variations still prevail in clinical laboratories. The use of PCR has expanded into applications ranging from microorganism detection to the diagnosis of complex genetic diseases [1-3]. Its simple implementation and the possibility of automating post-PCR analysis make PCR a powerful tool for high-throughput analysis [3]. Since its introduction with the LightCycler®, low-resolution post-PCR melting analysis using SYBR® Green I dye has been the method used to confirm reaction specificity and to detect primer dimers and other non-specific products [4]. Some years later, High Resolution Melting Analysis (HRMA) became possible with the advent of new intercalating dyes that can be used at high concentrations without compromising PCR efficiency [5]. HRMA allows fast, high-throughput analysis of PCR products and reinvigorated the use of DNA melting for a wide range of applications, including SNP genotyping and DNA mapping [6-9], gene scanning [10,11], heterozygosity screening [12], species identification [13,14] and many others.

The secondary structure formed by a particular nucleic acid molecule influences its melting profile. Many bioinformatic tools designed to predict melting curves of nucleic acids are available [15-17]. Software that predicts melting curves can efficiently identify regions with different denaturation profiles, and these regions can be exploited to differentiate similar sequences and to define targets for post-PCR tests [18]. Until now, studies that attempt to develop molecular tools based on melting curves have been restricted to the denaturation of double-stranded DNA (dsDNA) molecules. Reports of secondary structures formed by a single nucleic acid strand, particularly single-stranded RNA (ssRNA), have focused on the determination of viral or viroid genome structures [19-22], noncoding RNAs (ncRNAs) and small interfering RNAs (siRNAs) [23-26].

Using carefully calculated thermodynamic parameters, algorithms can predict the secondary structure of an RNA strand [27-33]. One of the most cited online servers that provide tools for working with RNA structures is the Vienna RNA Package [29]. Among the tools provided, RNAfold calculates the minimum free energy (MFE) and predicts an optimal secondary structure, and it can also compute the partition function and base-pairing probabilities using McCaskill's algorithm [30]. The Vienna RNA Package also provides a dedicated tool for assessing ssRNA melting curves, the RNAheat program, which reads RNA sequences and calculates their specific heat over a given temperature range from the partition function by numerical differentiation [31-33].
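The relation RNAheat relies on can be made concrete. The specific heat is obtained from the ensemble free energy $G(T) = -RT \ln Z(T)$ as $C(T) = -T\,\partial^2 G/\partial T^2$. The sketch below applies that relation to a deliberately simplified two-state molecule (folded or unfolded) with hypothetical enthalpy and entropy values, instead of a full McCaskill partition function over all secondary structures, and uses a plain central finite difference rather than the numerical scheme RNAheat uses internally. It only shows why a melting transition appears as a peak in $C(T)$ near the melting temperature.

```python
import math

R = 1.98717e-3  # gas constant, kcal/(mol*K)
DH_FOLD = -40.0  # hypothetical folding enthalpy, kcal/mol
DS_FOLD = -0.12  # hypothetical folding entropy, kcal/(mol*K)


def ensemble_free_energy(temp_k):
    """G(T) = -RT ln Z for a toy two-state molecule (folded/unfolded).

    The unfolded state is the reference (free energy 0), so
    Z = 1 + exp(-dG_fold / RT).
    """
    dg_fold = DH_FOLD - temp_k * DS_FOLD
    z = 1.0 + math.exp(-dg_fold / (R * temp_k))
    return -R * temp_k * math.log(z)


def specific_heat(g, temp_k, h=0.5):
    """C(T) = -T * d2G/dT2, estimated with a central finite difference of step h (K)."""
    d2g = (g(temp_k + h) - 2.0 * g(temp_k) + g(temp_k - h)) / (h * h)
    return -temp_k * d2g


temps = [273.15 + t for t in range(0, 101)]  # 0-100 C in 1-degree steps
heat = [specific_heat(ensemble_free_energy, t) for t in temps]
peak_c = temps[heat.index(max(heat))] - 273.15
tm_c = DH_FOLD / DS_FOLD - 273.15  # two-state melting temperature, dH/dS
print(f"two-state Tm = {tm_c:.1f} C, specific-heat peak at {peak_c:.1f} C")
```

With a real RNA, the partition function sums over many structures, so several partially independent structural elements can each contribute their own peak to $C(T)$.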

The identification of RNA secondary structures is particularly interesting when viral genomes are analyzed. Previous studies demonstrated that conserved stem-loops, extensive long-range interactions and small palindromic stem-loops generate structures that are generally associated with viral packaging capacity and regulate viral replication [19,21,34]. However, such processes and mechanisms are not fully understood in the Totiviridae family. Viruses of this family infect protozoa, fungi, insects and shrimp, and some of these organisms have medical, zootechnical and agricultural importance [35-38]. Totiviridae members have monopartite double-stranded RNA (dsRNA) genomes organized in two ORFs. ORF1 encodes a capsid protein (CP), and ORF2 encodes an RNA-dependent RNA polymerase (RdRp) that is highly conserved among the species of the family [39].

In the present study, we propose that the information extracted from the melting curve of a single-stranded RNA molecule allows more precise detection of nucleotide variations than traditional HRMA. To test this hypothesis, two programs, RNAheat and MELTSIM, were used to generate melting curves of nucleic acid sequences from Totiviridae viruses. The resulting melting curves were used to reconstruct groups determined by a traditional phylogenetic analysis based on RdRp sequence alignment. Subsequently, ssRNA and dsDNA melting curves were compared for their potential to discriminate Trichomonas vaginalis virus isolates. Our results indicate that the information obtained from ssRNA denaturation may support the development of more accurate methods to detect differences in nucleic acid sequences.

Results and discussion 

Phylogenetic analysis of Totiviridae family 

RNA-dependent RNA polymerase sequences are conserved within members of the families Totiviridae and Chrysoviridae [40]. Hence, they were used to estimate the phylogenetic relationships among these viruses. Twenty-eight RdRp amino acid sequences listed in Table 1 and two sequences listed in Table 2 were aligned, and their phylogenetic relationships are shown in Figure 1A. Eight monophyletic groups can be defined in the resulting dendrogram, and they were named following the classification of Liu et al. [40]. The groups IMNV-like (which comprises viruses that infect arthropods), GLV-like and ScV-like matched previously described inferences [40]. Four new groups were retrieved: MoV-like, comprising viruses that infect plants and fungi; TVV-like and LRV-like, comprising viruses that infect human protozoan parasites; and GaRV-like, comprising fungal viruses. To ensure the reliability of the analysis, the relationships between the TVV-like, LRV-like and GLV-like groups and their members were determined in a second phylogenetic analysis using the sequences listed in Table 2, shown in Figure 1B. All observed groups agree with the classification proposed by the International Committee on Taxonomy of
Table 1 Totiviridae amino acid sequences used in this study, grouped according to phylogenetic analysis

Group | Virus name | Accession No. | Abbreviation
MoV-like | Beauveria bassiana RNA virus 1 | CCC42235 | BeauV
MoV-like | Tolypocladium cylindrosporum virus 1 | YP_004089630 | TcV-1
MoV-like | Botryotinia fuckeliana totivirus 1 | YP_001109580 | BotryV
MoV-like | Helminthosporium victoriae virus 190S | NP_619670 | HvV-190S
MoV-like | Chalara elegans RNA virus 1 | YP_024728 | ChalElV
MoV-like | Helicobasidium mompa totivirus 1-17 | NP_898833 | HmV1-17
MoV-like | Magnaporthe oryzae virus 1 | YP_122352.1 | MoV-1
IMNV-like | Infectious myonecrosis virus | AHY18670.1 | IMNV
IMNV-like | Tianjin totivirus | AFE02920.1 | TianV
IMNV-like | Omono river virus | BAJ21511.1 | ORV
IMNV-like | Drosophila melanogaster totivirus SW-2009a | YP_003289293.1 | DmV-SW-2009a
IMNV-like | Armigeres subalbatus virus SaX06-AK20 | YP_003934934.1 | AsV-SaX06-AK20
IMNV-like | Piscine myocarditis virus AL V-708 | YP_004581250.1 | PMV-AL V-708
GLV-like | Giardia canis virus from China (2883-5981)* | DQ238861.1 | GCV
GLV-like | Giardia lamblia virus (2880-5978)* | NC_003555.1 | GLV2
ZbV-Z-like | Blueberry latent virus isolate AR (936-3332)* | HM029248.1 | BLV
ZbV-Z-like | Southern tomato virus isolate Mexico-1 (1039-3327)* | EF442780.1 | STV
ZbV-Z-like | Zygosaccharomyces bailii virus Z | NP_624325.1 | ZbV-Z
ScV-like | Ustilago maydis virus H1 (735-6002)* | NC_003823.1 | UmV-H1
ScV-like | Saccharomyces cerevisiae virus L-BC (La) | NP_042581.1 | ScV L-BC
ScV-like | Saccharomyces cerevisiae virus L-A | NP_620495.1 | ScV L-A
ScV-like | Black raspberry virus F | YP_001497151.1 | BRV-F
ScV-like | Tuber aestivum virus 1 | ADQ54106.1 | TaV-1
GaRV-like | Sphaeropsis sapinea RNA virus 2 | AAD11603.1 | SphaeroV
GaRV-like | Coniothyrium minitans RNA virus | YP_392467.1 | CmRV
GaRV-like | Epichloe festucae virus 1 | CAK02788.1 | EpiFesV
GaRV-like | Gremmeniella abietina RNA virus L2 | AAT48885.1 | GaRV-L2
Other sequence | Eimeria brunetti RNA virus 1 | NP_108651 | EbRV-1

*Accession numbers correspond to nucleotide sequences of complete genomes. Numbers in brackets correspond to the first and last nucleotides of RdRp coding sequences.



Viruses (ICTV) [41]. GLV-like comprises viruses of the genus Giardiavirus, and ScV-like comprises viruses of the genus Totivirus. The genus Victorivirus includes two groups, MoV-like and GaRV-like. The genera Leishmaniavirus and Trichomonasvirus include the groups LRV-like and TVV-like, respectively. The IMNV-like group appears as a less derived group close to GLV-like and is not classified by the ICTV. Zygosaccharomyces bailii virus (ZbV-Z) and two other related viruses isolated from plants and fungi clustered together to form a ZbV-Z-like clade on a basal branch of the phylogenetic tree (Figure 1A). Indeed, this group was formerly referred to as a primitive, less derived group, distantly related to totiviruses, and it includes viruses whose genomic organization differs from that of this family. A new family, Amalgamaviridae, has been proposed to accommodate these three viruses [40].
Table 2 Amino acid sequences of Trichomonasvirus, Leishmaniavirus and Giardiavirus used in this study, grouped according to phylogenetic analysis

Group | Virus name | Accession no. | Abbreviation
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-1 | AED99796.1 | TVV4-1
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-OC3 | AED99794.1 | TVV4-OC3
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-OC5 | AED99798.1 | TVV4-OC5
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-UR1 | AED99800.1 | TVV3-UR1
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-OC5 | AED99804.1 | TVV3-OC5
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-OC3 | AED99802.1 | TVV3-OC3
TVV3 | Trichomonas vaginalis virus 3 | NP_659390.1 | Trichomonasvirus_3
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-OC3 | AED99808.1 | TVV2-OC3
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-UR1 | AED99806.1 | TVV2-UR1
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-OC5 | AED99810.1 | TVV2-OC5
TVV2 | Trichomonas vaginalis virus II | NP_624323.2 | Trichomonasvirus_II
TVV2 | Trichomonas vaginalis virus 2 isolate C76 | AET81014.1 | TVV2-C76
TVV2 | Trichomonas vaginalis virus 2 isolate C351 | AET81016.1 | TVV2-C351
TVV1 | Trichomonas vaginalis virus | NP_620730.2 | Trichomonasvirus_I
TVV1 | Trichomonas vaginalis virus 1 isolate C344 | AET81012.1 | TVV1-C344
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-UH9 | AED99814.1 | TVV1-UH9
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC4 | AED99818.1 | TVV1-OC4
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC3 | AED99816.1 | TVV1-OC3
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-UR1 | AED99812.1 | TVV1-UR1
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC5 | AED99820.1 | TVV1-OC5
LRV-like | Leishmania RNA virus 2-1 | NP_043465.1 | LRV 2-1
LRV-like | Leishmania RNA virus 1-4 | NP_619653.1 | LRV 1-4
LRV-like | Leishmania RNA virus 1-1 | NP_041191.1 | LRV 1-1
GLV-like | Giardia canis virus from China | ABB36743.1 | GCV
GLV-like | Giardia lamblia virus | AAM77694.1 | GLV1
GLV-like | Giardia lamblia virus | NP_620070.1 | GLV2


Identification of conserved RNA secondary structures and melting curve generation

In HRMA, nucleotide variations between two PCR products are detected by comparing their melting curves. Although this approach has been successfully used to identify sequence polymorphisms [6-8] and to discriminate bacterial strains and viral variants [13,14,42], it can be rather inconclusive in some cases. High-resolution instruments and expensive dyes are required to detect point mutations, or when multiple mutations in the same sequence must be detected [43,44]. Considering that the ssRNA melting curve is closely related to the secondary structure assumed by an ssRNA molecule, we decided to investigate whether the melting curve of an ssRNA is more informative than a melting curve generated from a dsDNA. For this, RNA sequences from Totiviridae viruses coding for RdRp proteins were inspected in order to identify regions with conserved secondary structures. Conserved regions were selected to avoid major structural variation between the sequences. Initially, the RNA sequences listed in Tables 3 and 4 were screened together, but no conserved RNA structures common to all sequences were found. Interestingly, aligning the sequences of each group individually revealed regions with a high probability (>90%) of forming
Figure 1 Phylogenetic relationships between Totiviridae family members. Trees were calculated by Bayesian inference from an alignment of RdRp amino acid sequences from representative members of the Totiviridae family. The IDs of the sequences in trees A and B are shown in Tables 1 and 2. The numbers at branch nodes indicate posterior probabilities. The right curly brackets indicate the groups identified in this study, named in accordance with Liu et al. [40], and the colors represent the genera according to the ICTV.
Table 3 Totiviridae nucleotide sequences used in this study, grouped according to phylogenetic analysis

Group | Virus name | Accession code (GI) | Abbreviation
MoV-like | Beauveria bassiana RNA virus 1 (2672-5176) | 345108726 | BeauV
MoV-like | Tolypocladium cylindrosporum virus 1 (2604-5126) | 315573168 | TcV-1
MoV-like | Botryotinia fuckeliana totivirus 1 (2631-5147) | 134141995 | BotryV
MoV-like | Helminthosporium victoriae virus 190S (2605-5112) | 124484600 | HvV-190S
MoV-like | Chalara elegans RNA virus 1 (2619-4067) | 48696977 | ChalElV
MoV-like | Helicobasidium mompa totivirus 1-17 (2563-5100) | 33867950 | HmV1-17
MoV-like | Magnaporthe oryzae virus 1 (2818-5316) | 54193767 | MoV-1
IMNV-like | Penaeid shrimp infectious myonecrosis virus (5241-7490) | 459680256 | IMNV
IMNV-like | Tianjin totivirus (5319-7535) | 380715048 | TianV
IMNV-like | Omono river virus (5202-7535) | 307933349 | ORV
IMNV-like | Drosophila melanogaster totivirus SW-2009a (4841-6706) | 268053723 | DmV-SW-2009a
IMNV-like | Armigeres subalbatus virus SaX06-AK20 (5145-7430) | 309259994 | AsV-SaX06-AK20
IMNV-like | Piscine myocarditis virus AL V-708 (3114-5294) | 336042307 | PMV-AL V-708
GLV-like | Giardia canis virus from China (2883-5981) | 78217291 | GCV
GLV-like | Giardia lamblia virus (2880-5978) | 20143439 | GLV2
ZbV-Z-like | Blueberry latent virus isolate AR (936-3332) | 308097100 | BLV
ZbV-Z-like | Southern tomato virus isolate Mexico-1 (1039-3327) | 133776995 | STV
ZbV-Z-like | Zygosaccharomyces bailii virus Z (1106-3037) | 20889374 | ZbV-Z
ScV-like | Ustilago maydis virus H1 | 20564172 | UmV-H1
ScV-like | Saccharomyces cerevisiae virus L-BC (La) (1970-4561) | 9627980 | ScV L-BC
ScV-like | Saccharomyces cerevisiae virus L-A (2351-4546) | 20428567 | ScV L-A
ScV-like | Black raspberry virus F (2565-5009) | 157939583 | BRV-F
ScV-like | Tuber aestivum virus 1 (2169-4556) | 312233874 | TaV-1
GaRV-like | Sphaeropsis sapinea RNA virus 2 (2658-5135) | 3808226 | SphaeroV
GaRV-like | Coniothyrium minitans RNA virus (2386-4875) | 78762702 | CmRV
GaRV-like | Epichloe festucae virus 1 (2568-5051) | 94536498 | EpiFesV
GaRV-like | Gremmeniella abietina RNA virus L2 (2599-5076) | 49036574 | GaRV-L2
Other sequence | Eimeria brunetti RNA virus 1 (2667-5321) | NP_108651 | EbRV-1

Accession codes correspond to nucleotide sequences of complete genomes. Numbers in brackets correspond to the first and last nucleotides of RdRp coding sequences.



conserved RNA structures in groups IMNV-like, GaRV-like and GLV-like. Members of the MoV-like group showed conserved regions only when analyzed in subgroups: BotryV, TcV-1 and HvV-190S showed regions with conserved RNA structure in the second half of ORF2 when taken together, and the same was observed for the MoV-like members BeauV and MoV-1 when analyzed individually (data not shown). The groups ZbV-Z-like, ScV-like, TVV-like and LRV-like did not show conserved RNA regions with a high RNAz score; nevertheless, one conserved region of each group could be selected manually from the alignments (data not shown). Similarity between sequences clearly increases the chance of finding regions with conserved RNA structure. In agreement with the phylogenetic trees shown in Figure 1, individuals that share a secondary RNA structure belong to groups with shorter branches.

RNA secondary structures of the conserved regions 
found in groups IMNV-like, GLV-like and GaRV-like were 
Table 4 Nucleotide sequences of Trichomonasvirus, Giardiavirus and Leishmaniavirus used in this study, grouped according to phylogenetic analysis

Group | Virus name | Accession code (GI) | Abbreviation
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-1 (2534-4782) | 332015871 | TVV4-1
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-OC3 (2535-4783) | 332015868 | TVV4-OC3
TVV4 | Trichomonas vaginalis virus 4 strain TVV4-OC5 (2534-4782) | 332015874 | TVV4-OC5
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-UR1 (2448-4693) | 332015877 | TVV3-UR1
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-OC5 (2445-4690) | 332015883 | TVV3-OC5
TVV3 | Trichomonas vaginalis virus 3 strain TVV3-OC3 (2449-4694) | 332015880 | TVV3-OC3
TVV3 | Trichomonas vaginalis virus 3 (2645-4690) | 21450040 | Trichomonasvirus_3
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-OC3 (2380-4607) | 332015889 | TVV2-OC3
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-UR1 (2379-4606) | 332015886 | TVV2-UR1
TVV2 | Trichomonas vaginalis virus 2 strain TVV2-OC5 (2378-4605) | 332015892 | TVV2-OC5
TVV2 | Trichomonas vaginalis virus II (2302-4605) | 20889358 | Trichomonasvirus_II
TVV2 | Trichomonas vaginalis virus 2 isolate C76 (2317-4620) | 357529890 | TVV2-C76
TVV2 | Trichomonas vaginalis virus 2 isolate C351 (2314-4617) | 357529893 | TVV2-C351
TVV1 | Trichomonas vaginalis virus (2308-4578) | 20564174 | Trichomonasvirus_I
TVV1 | Trichomonas vaginalis virus 1 isolate C344 (2316-4578) | 357529887 | TVV1-C344
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-UH9 (2353-4615) | 332015898 | TVV1-UH9
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC4 (2353-4615) | 332015904 | TVV1-OC4
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC3 (2355-4617) | 332015901 | TVV1-OC3
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-UR1 (2354-4616) | 332015895 | TVV1-UR1
TVV1 | Trichomonas vaginalis virus 1 strain TVV1-OC5 (2351-4613) | 332015907 | TVV1-OC5
LRV-like | Leishmania RNA virus 2-1 (2858-5191) | 9628596 | LRV 2-1
LRV-like | Leishmania RNA virus 1-4 (2605-5241) | 20153346 | LRV 1-4
LRV-like | Leishmania RNA virus 1-1 (2612-5236) | 9626920 | LRV 1-1
GLV-like | Giardia canis virus from China (2883-5981) | 78217291 | GCV
GLV-like | Giardia lamblia virus (2880-5978) | 20143439 | GLV2
GLV-like | Giardia lamblia virus (2880-5978) | 21780360 | GLV1

Accession codes correspond to nucleotide sequences of complete genomes. Numbers in brackets correspond to the first and last nucleotides of RdRp coding sequences.



predicted using the software RNAfold. RNA fragments that show conserved RNA secondary structures in the IMNV-like group are indicated in Figure 2, column A, and the respective models generated from each sequence are shown in Figure 2, column B. These sequences were also used to perform an in silico melting curve analysis with RNAheat and MELTSIM, in order to obtain ssRNA melting curves (Figure 2, column C) and dsDNA melting curves (Figure 2, column D). The results of the same analysis for groups GLV-like and GaRV-like are shown in Additional file 1: Figure S1 and Additional file 2: Figure S2, respectively. It is interesting to observe that, in all cases, the ssRNA melting curve presents higher variation than the profile generated by dsDNA denaturation. This variation is possibly due to the presence of "bubbles" or "hairpins" formed by regions that lack perfect base-pair complementarity. These regions may comprise several small segments with different melting temperatures. When dsDNA is used, the melting curve variation arises only from differences in the number of hydrogen bonds between the strands, which can be caused by nucleotide mispairing. This subtle variation in dsDNA melting curves can be detected only with more sensitive and expensive methods. The denaturation profile generated by
Figure 2 Regions with conserved RNA secondary structures and their respective melting curves. This figure corresponds to the analysis of IMNV-like group sequences. (A) Regions with conserved secondary structure inside RdRp coding sequences, identified using RNAz. (B) Minimum free energy models calculated using RNAfold for each conserved region identified by RNAz. Structures are colored according to base-pairing probabilities: red denotes a high and purple a low probability that a given base is paired. For unpaired regions, the color denotes the probability of being unpaired. (C and D) Melting curves calculated from conserved regions using
ssRNA, as a result of the loss of its secondary structure, reflects variations in the nucleotide sequence more strongly and unambiguously. These variations are more pronounced when the number of paired regions interspersed with non-complementary regions is high. This can easily be seen by comparing the graphs in columns C and D of Figure 2: five profiles can be distinguished visually in column C, but not in column D.
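A toy model helps illustrate the argument above. Assuming, for illustration only, that an ssRNA region behaves as several independent hairpin domains that each unfold in a two-state (van't Hoff) transition at its own temperature, while a short dsDNA amplicon melts as a single cooperative unit, the derivative melting profile of the former shows several peaks and that of the latter only one. All melting temperatures and enthalpies below are hypothetical, and neither RNAheat nor MELTSIM works this way internally (MELTSIM, for instance, can also resolve several melting domains in longer dsDNA).

```python
import math

R = 1.98717e-3  # gas constant, kcal/(mol*K)


def fraction_unfolded(t_c, tm_c, dh):
    """Two-state fraction of one domain that is unfolded at t_c (Celsius).

    tm_c: domain melting temperature (C); dh: unfolding enthalpy (kcal/mol, > 0).
    """
    t, tm = t_c + 273.15, tm_c + 273.15
    k = math.exp(-dh / R * (1.0 / t - 1.0 / tm))  # equilibrium constant, unfolded/folded
    return k / (1.0 + k)


def derivative_profile(domains, temps, h=0.05):
    """d(theta)/dT for equally weighted, independent domains (central difference)."""
    weight = 1.0 / len(domains)

    def theta(t_c):
        return sum(weight * fraction_unfolded(t_c, tm, dh) for tm, dh in domains)

    return [(theta(t + h) - theta(t - h)) / (2.0 * h) for t in temps]


def count_peaks(profile, rel_threshold=0.1):
    """Count interior local maxima taller than rel_threshold * global maximum."""
    top = max(profile)
    return sum(
        1
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
        and profile[i] > rel_threshold * top
    )


temps = [40.0 + 0.5 * i for i in range(121)]  # 40-100 C
ssrna_like = [(55.0, 80.0), (70.0, 90.0), (85.0, 100.0)]  # hypothetical hairpin domains
dsdna_like = [(80.0, 400.0)]  # hypothetical single cooperative duplex
print("ssRNA-like peaks:", count_peaks(derivative_profile(ssrna_like, temps)))
print("dsDNA-like peaks:", count_peaks(derivative_profile(dsdna_like, temps)))
```

In this picture, a point mutation that shifts the stability of one hairpin moves one of several peaks, which is easier to see than a small shift of a single large peak.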

Clustering groups using melting curves 

To confirm that the information extracted from an ssRNA melting curve is more detailed than that of the corresponding dsDNA melting curve, clustering analyses were performed using the melting curves of each ssRNA and dsDNA sequence of groups IMNV-like, GaRV-like and GLV-like. The curves were compared per group and clustered using R [45]. The results are represented as dendrograms in Figure 3 and in Additional file 3: Figure S3. The relationships between individuals are determined exclusively by the similarity between the melting curves generated by RNAheat and MELTSIM. The groups obtained from the R analyses (Figure 3) were compared to the groups obtained in the phylogenetic analysis. Notably, the






IMNV-like and GaRV-like groups could be perfectly reconstructed by clustering the RNAheat melting curve data, but not by clustering the MELTSIM melting curves. In these two cases, the analysis using ssRNA melting curves showed higher resolution than the analysis using dsDNA melting curves. In other words, these results strongly indicate that the ssRNA melting curve is a good source of information about the nucleic acid sequence. Additional tests using the conserved sequences manually selected from the other groups confirmed that the resolution of dendrograms generated from RNAheat curves is never lower than that of dendrograms generated from MELTSIM curves (data not shown).
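The text does not state which distance measure or linkage method was used in R [45]. As an illustrative assumption only, the sketch below compares curves sampled on a common temperature grid by Euclidean distance and groups them by average linkage (UPGMA), which in R would correspond to `hclust(dist(x), method = "average")`. The four curves are hypothetical placeholders, not RNAheat or MELTSIM output.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Distance between two melting curves sampled on the same temperature grid."""
    if len(a) != len(b):
        raise ValueError("curves must share the same temperature grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(curves):
    """Average-linkage (UPGMA) clustering of named melting curves.

    curves: dict mapping a sequence name to its curve values on a shared grid.
    Returns the resulting tree as a Newick string without branch lengths.
    """
    sizes = {name: 1 for name in curves}  # cluster label -> number of members
    dist = {
        frozenset((a, b)): euclidean(curves[a], curves[b])
        for a, b in combinations(curves, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        sa, sb = sizes.pop(a), sizes.pop(b)
        del dist[pair]
        for c in sizes:  # size-weighted average distance to the new cluster
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (sa * dac + sb * dbc) / (sa + sb)
        sizes[merged] = sa + sb
    (tree,) = sizes
    return tree + ";"


# Hypothetical derivative curves for four sequences on a 5-point grid.
curves = {
    "A": [0.0, 0.2, 1.0, 0.2, 0.0],
    "B": [0.0, 0.25, 0.95, 0.2, 0.0],
    "C": [0.9, 0.3, 0.0, 0.1, 0.6],
    "D": [0.8, 0.35, 0.0, 0.1, 0.6],
}
print(upgma(curves))
```

Average linkage is only one reasonable choice; complete or Ward linkage, or a correlation-based distance that ignores overall curve height, could group the same curves differently.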

It is already known that the formation of secondary structures in nucleic acids can exert a significant influence on molecular function during DNA replication, transcription or translation. These secondary structures may vary within the molecule, or when DNA is transcribed to RNA, depending on the cellular context. Considering this, it is plausible that a given nucleic acid sequence may undergo different selective pressures according to variations in its conformation at different stages of its "life cycle". In single-stranded RNA viruses, the secondary structure of RNA could be selected by a large number of
Figure 3 Cluster analysis and dendrograms of groups IMNV-like and GaRV-like. Melting curves generated for each conserved RNA sequence in the same group were compared and clustered using a statistical inference. The proximity between the individuals of the groups indicated in column (A) is due exclusively to the similarity between the melting curves generated in silico. Columns (B) and (C) show the dendrograms calculated from curves generated by RNAheat and MELTSIM for the members of each group.
factors acting at the same time, including the compaction of the genetic material into the capsid. Therefore, we opted to eliminate any noise that could compromise the analysis of conserved RNA secondary structures and to ensure that natural selection would be acting mainly on the structures detected by RNAz. For this reason, the Totiviridae family seems an ideal model: during all replication steps, the genetic material of Totiviridae remains as RNA, and RNA secondary structures form only while the RNA is being replicated. This feature may have been decisive for the perfect reconstruction of phylogenetic groups by comparing RNA secondary structures.

Potential of single-strand melting curves for pathogen identification

Given that the analysis of single-strand denaturation enables higher resolution in the reconstruction of phylogenetic groups, it is also expected to be more efficient in distinguishing individuals within the same group. To confirm this, a phylogenetic analysis of sequences from a large number of members of the TVV-like group was performed (Figure 1B). The analysis of different variants of Trichomonas vaginalis virus revealed four subgroups, called TVV1, TVV2, TVV3 and TVV4, all belonging to the TVV-like group and to the genus Trichomonasvirus. Subgroup TVV1 was selected to generate melting curves in silico. An alignment of RdRp RNA sequences was used for RNAz screening. This analysis revealed one region with conserved RNA structure, shared by all viruses of this subgroup, in the second half of the RdRp RNA sequence. Then, two regions were selected: a non-conserved region and the conserved region detected by RNAz (Figure 4A and 4B, respectively). These regions were used to generate melting curves with RNAheat and MELTSIM (Figure 4C and 4D, respectively). The melting curves generated from ssRNA were clearly more informative than the curves generated by dsDNA denaturation. In both sets of melting curves generated by RNAheat, it is possible to differentiate seven Trichomonasvirus variants. Discriminating each virus is more difficult with the graphs generated by MELTSIM, because the variation in those melting curves occurs within a restricted temperature range.

Conclusions 

The results presented here strongly indicate that ssRNA melting curves are more informative than dsDNA melting curves. In addition, they demonstrate that conserved RNA regions may be determined from the analysis of phylogenetically related individuals, and that these regions may be used to support the reconstruction of their phylogenetic groups. These findings are a robust basis for the development of in vitro systems for detecting ssRNA melting curves.



Methods 

Data acquisition and phylogenetic analysis 

The nucleotide and amino acid sequences of Totiviridae viruses were retrieved from public repositories such as GenBank [www.ncbi.nlm.nih.gov] and UniProt [www.uniprot.org]. Sequences were aligned using the TCOFFEE and MCOFFEE algorithms [46] with default parameters and manually edited using Jalview v. 2.8 [47]. ORFs and conserved protein domains were identified using ORF Finder and the NCBI Conserved Domain Database (CDD), respectively. The RdRp sequence of Micromonas pusilla virus (Reoviridae family; Accession number YP654545) was used as the outgroup due to its proximity and similarity to the family Totiviridae. The RdRp sequences were then aligned at the amino acid level using MAFFT v. 6.85 [48,49] with the L-INS-i strategy, a gap opening penalty of 1.53 and an offset value of 0.1, guided by the structural alignment of protein family pfam02123 from the CDD [50]. They were then re-aligned using MUSCLE [51]. Afterwards, the best-fit amino acid substitution model was estimated using ProtTest v.3.2 [52], and the dendrograms were calculated by Bayesian analysis using MrBayes 3.1.2 [53,54] and BEAST v.1.8 [55]. All indels and non-informative sites (missing data) in the alignment were treated by partial deletion, with a cutoff of 75%, to avoid potentially ambiguous regions in the topologies. The Bayesian inferences were conducted using three independent runs, with a fixed LG or WAG model, gamma-distributed rates among sites and fixed amino acid frequencies. Each Markov chain was initiated with a random tree and run for 10^6 generations, sampled every 100 generations, and a consensus tree was estimated using a burn-in of 1,000,000 trees. The convergence of the simultaneous runs was assessed using Tracer v. 1.5 [56] in order to evaluate the statistical support and robustness of the Bayesian analysis. The trees generated by the programs were edited in FigTree [56].
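The partial-deletion step can be sketched as follows. This is one interpretation of "partial deletion with a cutoff of 75%", in the sense of a site-coverage cutoff as used, for example, in MEGA: an alignment column is kept only if at least 75% of the sequences carry an unambiguous residue there. The text does not name the tool used for this step, and the set of symbols treated as gaps or missing data is an assumption.

```python
GAP_OR_MISSING = set("-?X")  # assumed symbols for gaps and missing/ambiguous residues


def partial_deletion(alignment, coverage_cutoff=0.75):
    """Keep only columns in which at least `coverage_cutoff` of sequences have a residue.

    alignment: dict mapping sequence name to an aligned amino acid string.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    kept = [
        i
        for i in range(length)
        if sum(alignment[n][i] not in GAP_OR_MISSING for n in names) / len(names)
        >= coverage_cutoff
    ]
    return {n: "".join(alignment[n][i] for i in kept) for n in names}


# Hypothetical four-sequence alignment: column 3 (two gaps) falls below 75% coverage.
aln = {
    "seq1": "MK-LV",
    "seq2": "MKALV",
    "seq3": "M--LV",
    "seq4": "MKA-V",
}
print(partial_deletion(aln))
```

Unlike complete deletion, which drops every column containing any gap, this keeps columns where only a minority of sequences are missing data, so fewer informative sites are lost.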

RNA secondary structure prediction 

Conserved RNA secondary structures were detected from 
TCOFFEE multiple alignments of RdRp RNA sequences 
(Tables 3 and 4), using the RNAz software, provided by 
Vienna RNA Web Services [57]. This tool detects a con- 
sensus secondary structure for an alignment based on 
thermodynamic stability and structural conservation. A 
normalized measure of thermodynamic stability is com- 
puted by comparing the minimal free energy (MFE) of a 
native sequence to the MFEs of a large number of random 
sequences of the same length and base composition. Then, 
a z-score is calculated as $z = (m - \mu)/\sigma$, where $m$ is the MFE of the native sequence and $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the MFEs of the random samples [58]. Negative z-scores indicate that a sequence is more stable than expected by chance. The structural conservation is predicted using the RNAalifold approach [59].

Figure 4 Differentiating members of the subgroup TVV1 using in silico melting curves. A region with variable RNA secondary structure (A) and a region with conserved RNA secondary structure (B) were obtained with RNAz from the alignment of ORF2 of all members of the group. The curves were generated by RNAheat and MELTSIM for the conserved (C) and variable (D) regions. Each denaturation curve is shown in a different color: dark blue, TVV1-C344; orange, TVV1-OC3; yellow, TVV1-OC4; green, TVV1-OC5; dark red, TVV1-UR1; light blue, TVV1-UH9; dark green, TVV1-1.

The secondary
structures were then calculated with RNAfold from the Vienna RNA Package [57], using the sequences selected from the RNAz output. RNAfold reads RNA sequences and calculates their MFE structure and its free energy. The -p option was used to compute the partition function (PF) and the base-pairing probability matrix, as well as the free energy of the thermodynamic ensemble. RNAfold writes PostScript files with a plot of the resulting secondary structure and a dot plot of the base-pairing probability matrix. Default parameters were used to generate the interactive RNA structure plots.
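The z-score normalization described above can be illustrated with a short Python sketch that follows the sampling description in the text: the native MFE is compared with the MFEs of shuffled sequences that keep the same length and base composition. Real MFEs would come from a folding program such as RNAfold; here `mfe_fn` is a pluggable function and `toy_hairpin_energy` is a hypothetical stand-in that merely counts complementary end-to-end pairs, so its values are not free energies. RNAz itself may obtain the random-sequence statistics differently.

```python
import random
import statistics

# Pairs allowed by the toy energy model (Watson-Crick plus G-U wobble).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def toy_hairpin_energy(seq):
    """Hypothetical MFE stand-in: minus the number of positions i that can pair
    with position len(seq) - 1 - i (a single perfect-hairpin register)."""
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in PAIRS for i in range(n // 2)))


def shuffled(seq, rng):
    """Random permutation of seq: same length and base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mfe_z_score(seq, mfe_fn, n_samples=1000, seed=0):
    """z = (m - mu) / sigma, with m the native MFE and mu, sigma the mean and
    standard deviation of the MFEs of n_samples shuffled sequences."""
    rng = random.Random(seed)
    m = mfe_fn(seq)
    random_mfes = [mfe_fn(shuffled(seq, rng)) for _ in range(n_samples)]
    mu = statistics.mean(random_mfes)
    sigma = statistics.stdev(random_mfes)
    if sigma == 0:
        raise ValueError("all shuffled sequences have the same MFE; z is undefined")
    return (m - mu) / sigma


z = mfe_z_score("GGGCGCAAAAGCGCCC", toy_hairpin_energy)
print(z < 0)  # prints True: more "stable" than expected by chance
```

A homopolymer such as `"AAAA"` has no variation among its shuffles, so the sketch raises an error instead of dividing by zero.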



Melting curve analysis 

The dsDNA melting curves were estimated using the MELTSIM software, which generates derivative melting profiles. In the model used by this software, proposed by Blake et al. [15], loop entropy is incorporated into a one-dimensional Ising lattice model [60-62]. By default, the program starts the simulation at 60°C (T1) and increases the temperature in steps of 0.05°C until it reaches 100°C (T2). The single-stranded RNA melting curves were estimated using the RNAheat software [31]. This program reads RNA sequences and calculates their specific heat over a predefined temperature range by numerical differentiation of the partition function, which describes the statistical properties of a system in thermodynamic equilibrium. The temperature dependence of the partition function provides information about the melting behavior of the secondary structure. Because the overwhelming majority of configurations correspond to the unfolded state, the high-temperature ensemble is unfolded. Following the reference point proposed by McCaskill [30], which assigns zero entropy to the unfolded chain, the partition function must decrease toward one at high temperature, and the specific heat reflects any structural transitions that occur as the temperature increases. The output is a list of temperatures (°C) and the corresponding specific heat values in kcal/(mol·K) [31]. The results calculated from 0 to 100°C were plotted using R [45].
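The numerical differentiation performed by RNAheat can be illustrated with a deliberately simplified Python sketch. Instead of the full RNA secondary-structure partition function, it assumes a hypothetical two-state hairpin with folding enthalpy `dh` and entropy `ds`, taking the unfolded chain as the zero-free-energy reference, so that Z(T) = 1 + exp(-ΔG(T)/RT) tends toward one at high temperature, as described above. The ensemble free energy is G = -RT ln Z, and the specific heat follows from C(T) = -T ∂²G/∂T², estimated here by central finite differences. The parameter values and function names are illustrative assumptions, not RNAheat's internals or defaults.

```python
import math

R = 0.0019872  # gas constant, kcal/(mol*K)


def ensemble_free_energy(t_kelvin, dh=-50.0, ds=-0.15):
    """G(T) = -RT ln Z for a hypothetical two-state hairpin.

    The unfolded chain is the reference state (free energy 0), so
    Z = 1 + exp(-dG/RT) with dG = dh - T*ds, and Z -> 1 at high temperature.
    dh in kcal/mol and ds in kcal/(mol*K); both values are illustrative.
    """
    dg = dh - t_kelvin * ds
    z = 1.0 + math.exp(-dg / (R * t_kelvin))
    return -R * t_kelvin * math.log(z)


def specific_heat(t_celsius, h=0.5, **params):
    """C(T) = -T * d2G/dT2 by central finite differences, in kcal/(mol*K)."""
    t = t_celsius + 273.15
    g_minus = ensemble_free_energy(t - h, **params)
    g_mid = ensemble_free_energy(t, **params)
    g_plus = ensemble_free_energy(t + h, **params)
    return -t * (g_plus - 2.0 * g_mid + g_minus) / (h * h)


def melting_profile(t_min=0.0, t_max=100.0, step=0.5, **params):
    """List of (temperature in degC, specific heat) pairs over a fixed grid."""
    n = int(round((t_max - t_min) / step)) + 1
    return [(t_min + i * step, specific_heat(t_min + i * step, **params)) for i in range(n)]


profile = melting_profile()
t_peak, c_peak = max(profile, key=lambda point: point[1])
print(f"peak at {t_peak:.1f} degC, C = {c_peak:.2f} kcal/(mol*K)")
```

With these toy values the midpoint temperature is dh/ds ≈ 333 K (about 60°C), so the single specific-heat peak appears near 60°C and the curve falls toward zero at both ends of the 0 to 100°C range.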

Statistical and grouping analysis 

Based on the melting denaturation scores, the melting curves were grouped by hierarchical cluster analysis in R [45]. This technique was used to identify the mutually exclusive groups that could be obtained from the sample, considering only the similarities or differences between the curves. Dendrograms were built using the single linkage (nearest neighbor) method with the squared Euclidean distance. This algorithm first merges the two objects separated by the smallest distance into a group. At each subsequent step, the two closest clusters are merged, where the distance between two clusters is the smallest distance between any of their members; the next object may therefore join an existing group, or two objects may form a new group. The process continues until all objects have been merged into a single group. The nucleotide sequences of the identified regions with conserved secondary structures were aligned in MEGA5 [63] using the MUSCLE algorithm. Each alignment was used in a neighbor-joining analysis, using the Maximum Composite Likelihood distance and 500 bootstrap replications. The resulting dendrograms were visually compared with those from the hierarchical cluster analyses based on the ssRNA and dsDNA melting denaturation scores.
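The single-linkage procedure described above can be sketched in Python (the analysis itself was run in R, where `hclust()` with `method = "single"` applies the same linkage rule). Each melting curve is treated as a vector of specific-heat values sampled on a shared temperature grid, which is an assumption about how the melting denaturation scores are represented; the curve names and values in the example are hypothetical. The distance between two clusters is the smallest squared Euclidean distance between any of their members, and the two closest clusters are merged at every step.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equally sampled curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_linkage(curves):
    """Agglomerative single-linkage (nearest-neighbor) clustering.

    curves: dict mapping a name to a list of values on a shared grid.
    Returns the merges in order as (merged_cluster, distance) tuples, where
    merged_cluster is a frozenset of names; this is what a dendrogram displays.
    """
    clusters = [frozenset([name]) for name in curves]
    merges = []

    def linkage(c1, c2):
        # Nearest-neighbor rule: closest pair of members across the two clusters.
        return min(squared_euclidean(curves[a], curves[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        distance = linkage(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((c1 | c2, distance))
    return merges


# Hypothetical two-point "curves" for four sequences.
curves = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0], "D": [5.0, 7.0]}
for cluster, distance in single_linkage(curves):
    print(sorted(cluster), distance)
# ['A', 'B'] 1.0
# ['C', 'D'] 4.0
# ['A', 'B', 'C', 'D'] 41.0
```

The brute-force search over all cluster pairs is adequate for the small number of curves compared here; larger data sets would call for a more efficient implementation.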

Availability of supporting data 

All supporting data are included in the Additional files.

Additional files 



Additional file 2: Figure S2. Regions with conserved RNA secondary 
structures identified in GaRV-like group and their respective melting 
curves. (A) Regions with secondary structures identified using RNAz 
software, from the alignment of ORF2 RNA sequences of GaRV-like group 
members. (B) Secondary structure calculated using RNAfold, corresponding 
to each conserved region identified by RNAz. (C) Melting curves calculated 
from the conserved region, using the software RNAheat which considers 
ssRNA denaturation. (D) Melting curves calculated from the conserved 
region, using the software MELTSIM, which considers dsDNA denaturation.

Additional file 3: Figure S3. Cluster analysis and dendrograms of the
GLV-like group. The curves generated for each sequence were compared
and clustered by hierarchical cluster analysis. The proximity between
members of the groups indicated in column (A) reflects only the
similarity between the melting curves generated in silico. Columns (B)
and (C) show the dendrograms calculated from the curves generated by
RNAheat and MELTSIM, respectively, for the members of the GLV-like group.